Efficient Determination of Dynamic Split Points in a Decision Tree

نویسندگان

  • David Maxwell Chickering
  • Christopher Meek
  • Robert Rounthwaite
چکیده

We consider the problem of choosing split points for continuous predictor variables in a decision tree. Previous approaches to this problem typically either (1) discretize the continuous predictor values prior to learning or (2) apply a dynamic method that considers all possible split points for each potential split. In this paper, we describe a number of alternative approaches that generate a small number of candidate split points dynamically with little overhead. We argue that these approaches are preferable to pre-discretization, and provide experimental evidence that they yield probabilistic decision trees with the same prediction accuracy as the traditional dynamic approach. Furthermore, because the time to grow a decision tree is proportional to the number of split points evaluated, our approach is significantly faster than the traditional dynamic

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of Different Methods of Decision Tree Algorithm for Mapping Rangeland Using Satellite Imagery (Case Study: Doviraj Catchment in Ilam Province)

Using satellite imagery for the study of Earth's resources is attended by manyresearchers. In fact, the various phenomena have different spectral response inelectromagnetic radiation. One major application of satellite data is the classification ofland cover. In recent years, a number of classification algorithms have been developed forclassification of remote sensing data. One of the most nota...

متن کامل

Determining Factors Influencing Length of Stay and Predicting Length of Stay Using Data Mining in the General Surgery Department

Background: Length of stay is one of the most important indicators in assessing hospital performance. A shorter stay can reduce the costs per discharge and shift care from inpatient to less expensive post-acute settings. It can lead to a greater readmission rate, better resource management, and more efficient services. Objective: This study aimed to ident...

متن کامل

An Optimal Algorithm for Closest-Pair Maintenance

Given a set S of n points in k-dimensional space, and an L t metric, the dynamic closest pair problem is deened as follows: nd a closest pair of S after each update of S (the insertion or the deletion of a point). For xed dimension k and xed metric L t , we give a data structure of size O(n) that maintains a closest pair of S in O(log n) time per insertion and deletion. The running time of algo...

متن کامل

Temperature Prediction in Electric Arc Furnace with Neural Network Tree

This paper presents a neural network tree regression system with dynamic optimization of input variable transformations and post-training optimization. The decision tree consists of MLP neural networks, which optimize the split points and at the leaf level predict final outputs. The system is designed for regression problems of big and complex datasets. It was applied to the problem of steel te...

متن کامل

CC-SLIQ: Performance Enhancement with 2k Split Points in SLIQ Decision Tree Algorithm

Decision trees have been found to be very effective for classification in the emerging field of data mining. This paper proposes a new method: CC-SLIQ (Cascading Clustering and Supervised Learning In Quest) to improve the performance of the SLIQ decision tree algorithm. The drawback of the SLIQ algorithm is that in order to decide which attribute is to be split at each node, a large number of G...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001